Method and apparatus for generating compact transcoding hints metadata

ABSTRACT

An audio/video (or audiovisual, “A/V”) signal processing apparatus and method for extracting a compact representation of a multimedia description and transcoding hints metadata for transcoding between different (e.g., MPEG) compressed content representations, manipulating (e.g., MPEG compressed) bitstream parameters such as frame rate, bit rate, session size, quantization parameters, and picture coding type structure (e.g., group of pictures, or “GOP”), classifying A/V content, and retrieving multimedia information.

TECHNICAL FIELD

The present invention relates to an audio/video (or audiovisual, “A/V”) signal processing method and an A/V signal processing apparatus for extracting a compact representation of a multimedia description and transcoding hints metadata for transcoding between different (e.g., MPEG) compressed content representations, manipulating (e.g., MPEG compressed) bitstream parameters such as frame rate, bit rate, session size, quantization parameters, and picture coding type structure, such as group of pictures, or “GOP”, classifying A/V content, and retrieving multimedia information.

BACKGROUND ART

A/V content is increasingly being transmitted over optical, wireless, and wired networks. Since these networks are characterized by different network bandwidth constraints, there is a need to represent A/V content by different bit rates resulting in varying subjective visual quality. Additional requirements on the compressed representation of A/V content are imposed by the screen size, computational capabilities, and memory constraints of an A/V terminal.

Therefore, A/V content stored in a compressed format, e.g., as defined by the Moving Picture Experts Group (“MPEG”), must be converted to, e.g., different bit rates, frame rates, screen sizes, and in accordance with varying decoding complexities and memory constraints of different A/V terminals.

To avoid the need for storing multiple compressed representations of the same A/V content for different network bandwidths and different A/V terminals, A/V content stored in a compressed MPEG format may be transcoded to a different MPEG format.

With respect to video transcoding, reference is made to the following:

-   WO09838800A1: O. H. Werner, N. D. Wells, M. J. Knee: Digital Compression Encoding with improved quantization, 1999, proposes an adaptive quantization scheme;
-   U.S. Pat. No. 5,870,146: Zhu; Qin-Fan: Device and method for digital video transcoding, 1999;
-   WO09929113A1: Nilsson, Michael, Erling; Ghanbari, Mohammed: Transcoding, 1999;
-   U.S. Pat. No. 5,805,224: Keesman; Gerrit J, Van Otterloo; Petrus J.: Method and Device for Transcoding Video Signal, 1998;
-   WO09943162A1: Golin, Stuart, Jay: Motion vector extrapolation for transcoding video sequences, 1999;
-   U.S. Pat. No. 5,838,664: Polomski; Mark D.: Video teleconferencing system with digital transcoding, 1998;
-   WO09957673A2: Balliol, Nicolas: Transcoding of a data stream, 1999;
-   U.S. Pat. No. 5,808,570: Bakhmutsky; Michael: Device and Method for pair-matching Huffman-Transcoding and high performance variable length decoder with two-word bitstream segmentation which utilizes the same, 1998;
-   WO09905870A2: Lemaguet, Yann: Method of Switching between Video Sequences and corresponding Device, 1999; and
-   WO09923560A1: LUDWIG, Lester; BROWN, William; YUL, Inn, J.; VUONG, Anh, T.; VANDERLIPPE, Richard; BURNETT, Gerald; LAUWERS, Chris; LUI, Richard; APPLEBAUM, Daniel: Scalable networked multimedia system and application, 1999.

However, none of these patents on video transcoding disclose or suggest using transcoding hints metadata information to facilitate A/V transcoding.

The Society of Motion Picture and Television Engineers (“SMPTE”) proposed a standard for Television on MPEG-2 Video Recoding Data Set (327M-2000), which provides for re-encoding metadata using 256 bits for every macroblock of the source format. However, this extraction and representation of transcoding hints metadata has several disadvantages. For example, according to the proposed standard, transcoding hints metadata (such as GOP structure, quantizer settings, motion vectors, etc.) is extracted for every single frame and macroblock of the A/V source content. This method has the advantage of offering detailed and content-adaptive transcoding hints and facilitates transcoding while largely preserving the subjective A/V quality. However, the size of the transcoding hints metadata is very large. In one specific implementation of the proposed standard, 256 bits of transcoding hints metadata are stored per macroblock of MPEG video. This large amount of transcoding hints metadata is not feasible for, say, broadcast distribution to a local (e.g., home) A/V content server. Consequently, the proposed standard on transcoding hints metadata is limited to broadcast studio applications.

Another technique for transcoding hints metadata extraction and representation includes collecting general transcoding hints metadata for the transcoding of compressed A/V source content with a specific bit rate to another compressed format and bit rate. However, this technique is disadvantageous in not taking the characteristic properties of the transcoded content into account. For example, in the source content, the A/V characteristics may change from an A/V segment with a limited amount of motion and few details (e.g., a news anchor scene) to another A/V segment depicting fast motion and numerous details (e.g., a sports event scene). According to this technique, misleading transcoding hints metadata, which would not suitably represent the different characteristics of both video segments, would be selected and, therefore, result in poor A/V quality and faulty bit rate allocation.

DISCLOSURE OF THE INVENTION

In view of the foregoing, it is an object of the present invention to provide a method and apparatus for extracting a compact and A/V-content adaptive multimedia description and transcoding hints metadata representation.

It is another object of the invention to provide a transcoding method and apparatus that allow for real-time execution without significant delay and prohibitive computational complexity, which is one of the requirements for a transcoding method. A second requirement for a transcoding method is to preserve the subjective A/V quality as much as possible. To facilitate a transcoding method that fulfills both of these requirements for various compressed target formats, transcoding hints metadata may be generated in advance and stored separately or together with the compressed A/V content. It is a further object of this invention to provide a highly compact representation to reduce storage size and to facilitate distribution (e.g., broadcast to local A/V content server) of multimedia description and transcoding hints metadata.

It is, thus, an object of the invention to provide a transcoding system that: 1) preserves the A/V quality through the transcoding process, and 2) limits the computational complexity in order to enable real-time applications with minimal delay. In accordance with an embodiment of the invention, additional data (metadata) covering transcoding hints may be associated with the compressed A/V content.

Other objects and advantages of the invention will in part be obvious and will in part be apparent from the specification and the drawings.

The present invention is directed to an apparatus and method that provide automatic transcoding hints metadata extraction and compact representation.

The present invention is in the field of transcoding compressed A/V content from one compressed format into A/V content of another format by using supporting transcoding metadata. The term transcoding includes, but is not limited to, changing the compressed format (e.g., conversion from MPEG-2 format to MPEG-4 format), frame-rate conversion, bit-rate conversion, session-size conversion, screen-size conversion, picture coding type conversions, etc.

The present invention may also be applied to automatic video classification using the aforementioned transcoding hints states as classes of different scene activity in video.

The invention accordingly comprises the several steps and the relation of one or more of such steps with respect to each of the others, and the apparatus embodying features of construction, combination(s) of elements and arrangement of parts that are adapted to effect such steps, all as exemplified in the following detailed disclosure, and the scope of the invention will be indicated in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the invention, reference is made to the following description and accompanying drawing(s), in which:

FIG. 1 depicts a system overview of a transcoding system in a home network with various A/V terminals in accordance with an embodiment of the invention;

FIG. 2 illustrates the transcoding hints extraction, transcoding hints storage, and transcoding process in accordance with an embodiment of the invention;

FIG. 3 illustrates an example for the selection of transcoding states depending on the number of new feature points per frame according to an embodiment of the invention;

FIG. 4 shows an example of a transcoding hints state diagram with 3 states according to an embodiment of the invention;

FIG. 5 illustrates the transcoding hints metadata extraction from compressed and uncompressed source content in accordance with an embodiment of the invention;

FIG. 6 shows a video segmentation and transcoding hints state selection process in accordance with an embodiment of the invention;

FIG. 7 shows a method of determining the boundaries of a new video segment (or new GOP) in accordance with an embodiment of the invention;

FIG. 8 shows an algorithm on how to select the transcoding hints state in accordance with an embodiment of the invention;

FIG. 9 provides an overview of a structural organization of transcoding hints metadata in accordance with an embodiment of the invention;

FIG. 10 depicts a structural organization of a general transcoding hints metadata description scheme according to an embodiment of the invention;

FIG. 11 depicts the transcoding hints metadata for source format definition according to an embodiment of the invention;

FIG. 12 depicts the transcoding hints metadata for target format definition according to an embodiment of the invention;

FIG. 13 depicts the general transcoding hints metadata representation according to an embodiment of the invention;

FIG. 14 depicts the segment-based transcoding hints metadata representation according to an embodiment of the invention;

FIG. 15 depicts the encoding complexity transcoding hints metadata according to an embodiment of the invention; and

FIG. 16 depicts the transcoding hints state metadata according to an embodiment of the invention.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 depicts a general overview of a system 100 for transcoding in a home network environment in accordance with an embodiment of the invention. As shown in FIG. 1, an A/V content server 102 includes an A/V content storage 103, an A/V transcoding unit 106, a transcoding hints metadata extraction unit 104, and an A/V transcoding hints metadata storage buffer 105. A/V content storage 103 stores compressed A/V material from various sources with varying bit rate and varying subjective quality. For example, A/V content storage 103 may contain home video from a portable Digital Video (“DV”) video camera 111, MPEG-4 compressed video with a very low bit rate (of, say, 10 kbit/s) from an MPEG-4 Internet camera 112, and MPEG-2 Main Profile at Main Level (“MP@ML”) compressed broadcast video of around 5 Mbit/s from a broadcast service 101, which is in some cases already associated with transcoding hints metadata. A/V content server 102 may also contain high definition compressed MPEG video at considerably higher bit rates.

As shown in FIG. 1, A/V content server 102 is connected to a network 113, which may be a wire-based or wireless home network. Several A/V terminals with different characteristics may also be attached to network 113, including, but not limited to: a wireless MPEG-4 A/V personal digital assistant (“PDA”) 107, a high resolution A/V terminal for high definition television entertainment 108, an A/V game console 109, and an International Telecommunications Union Technical Standards Group (“ITU-T”) based videophone 110. The A/V terminals 107, 108, 109, and 110 may be attached with different bit rate transmission capabilities (due to cable or radio link) to home network 113.

Furthermore, wireless video PDA 107, for example, may be limited in terms of computational power, storage memory, screen size, video frame rate, and network bit rate. Therefore, A/V transcoding unit 106 may transcode, for example, 5 Mbit/s MPEG-2 broadcast video at European 25 frames per second (“fps”) and 720×480 pel contained in A/V content server 102 to an MPEG-4 500 kbit/s 15 fps video for wireless transmission and display on a 352×240 pel display by wireless MPEG-4 video PDA 107. A/V transcoding unit 106 may use the transcoding hints metadata from buffer 105 to transcode, in real time, the compressed source bit rate of the A/V content to the capabilities of each specific target A/V terminal 107, 108, 109, and 110. The transcoding hints metadata are generated in transcoding hints metadata extraction unit 104 or they may be distributed by a broadcast service 101.

As shown in FIG. 1, a compressed bitstream in a source format (hereinafter “first bitstream”) 116 is transferred from A/V content storage 103 to A/V transcoding unit 106. A bitstream in a target format (hereinafter “second bitstream”) 115 is transferred after transcoding in transcoding unit 106 to home network 113. From home network 113, content in, e.g., compressed DV format is stored in A/V content storage 103 via link 114.

FIG. 2 illustrates the transcoding hints extraction, transcoding hints storage, and transcoding process in accordance with an embodiment of the invention. As shown in FIG. 2, a buffer 201 contains A/V content in a source format. A buffer 202 contains a description of the source format, such as bit rate, compression method, GOP structure, screen size, interlaced or progressive format, etc. A buffer 203 contains a description of a target format, such as bit rate, compression method, GOP structure, screen size, interlaced or progressive format, etc. A transcoding hints extraction unit 207 reads the A/V content in compressed source format from A/V buffer 201, as well as the source format description from buffer 202 and the transcoding target format description from buffer 203. After the transcoding hints are calculated by transcoding hints extraction unit 207, the transcoding hints are stored in a transcoding hints metadata buffer 206. An A/V transcoding unit 205 reads first bitstream 204 in the source format from A/V content buffer 201 and transforms the source format into the target format by means of the transcoding hints metadata stored in buffer 206. A/V transcoding unit 205 outputs second bitstream 208 in the new compressed target format to an A/V target format buffer 209 for storage.
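To make the scale of the reduction in the example above concrete, the following minimal sketch works out the average bit budget per frame and per pel for the source and target formats quoted in the text. The class name `StreamFormat` and its fields are hypothetical helpers introduced only for this illustration; they are not part of the described apparatus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFormat:
    """Hypothetical container for the figures quoted in the example."""
    bit_rate: float    # bit/s
    frame_rate: float  # frames per second
    width: int         # pels per line
    height: int        # lines per frame

    @property
    def bits_per_frame(self) -> float:
        # Average number of bits available for one coded frame.
        return self.bit_rate / self.frame_rate

    @property
    def bits_per_pel(self) -> float:
        # Average number of bits available for one pel of one frame.
        return self.bits_per_frame / (self.width * self.height)


# Figures taken from the example: MPEG-2 source and MPEG-4 target.
source = StreamFormat(bit_rate=5_000_000, frame_rate=25, width=720, height=480)
target = StreamFormat(bit_rate=500_000, frame_rate=15, width=352, height=240)

for name, fmt in (("source", source), ("target", target)):
    print(f"{name}: {fmt.bits_per_frame:.0f} bit/frame, {fmt.bits_per_pel:.3f} bit/pel")
print(f"bit rate reduction factor: {source.bit_rate / target.bit_rate:.1f}")
```

The per-pel figure shrinks far less than the overall bit rate because the target also has a lower frame rate and a smaller picture size.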

FIGS. 3 and 4 illustrate the principle of transcoding hints metadata organization in accordance with an embodiment of the invention. MPEG-based video compression uses a predictive method, where changes between successive frames are encoded. Video content with a large number of changes from one frame to the next frame requires (for maintaining the subjective quality while limiting the bit rate) different re-encoding parameter settings than video content with small changes between frames. Therefore, it is important to decide in advance on the re-encoding parameters. The transcoding hints metadata selection mainly depends on the amount and characteristics of unpredictable visual content. The new visual content cannot be predicted from previous frames and may require bit-rate-intensive encoding using DCT-coefficients. As such, the inventive method uses the number of new feature points, which are not tracked from a previous frame to a current frame, to determine the amount of new content per frame.

FIG. 3 depicts a graph of the number of new feature points per frame depending on the frame number of a video (horizontal axis, time axis). Section 301 is a part of a video segment where only a very small amount of new content appears between succeeding frames, and therefore respective transcoding hints metadata (e.g., large GOP size, low frame rate, low bit rate, . . . ) may be chosen. Section 302 includes a slightly higher number of new feature points per frame, which means that a state describing transcoding hints metadata is chosen that provides optimum transcoding parameters for this situation (e.g., slightly smaller GOP size, higher bit rate). Section 303 depicts a transcoding hints metadata state with a high number of new feature points per frame, and therefore a high amount of new content per scene. As such, a smaller M value (I/P-frame distance) and a higher bit rate are chosen.

FIG. 4 depicts an example of the basic organization of a transcoding hints metadata state diagram consisting of three discrete transcoding hints metadata states. Every discrete transcoding state may contain metadata for GOP structure, quantizer parameters, bit rate, screen size, etc. These transcoding hint parameters may have a fixed value or may be a function of another parameter. For example, the GOP length may be a discrete function of the number of new feature points per frame and the quantizer parameters may be a function of the edge and texture activity derived from the DCT coefficients. Each of the three transcoding hints metadata states in this example may be selected to accommodate three different encoding situations. As shown in FIG. 4, state “3” 403 is selected for a high amount of motion and low amount of new content per frame and represents the optimum state for transcoding hints metadata for such content. State “2” 402 is selected for low amount of motion and high amount of content with high edge activity, which may require a high number of bits to be spent. State “1” 401 is, for example, selected to accommodate the transcoding process for A/V content with low scene activity. There are also other special transcoding hint metadata states provided for video editing effects, like different crossfading effects, abrupt scene changes, or black pictures between two scenes. The location of the video editing effects may be detected manually, semi-automatically, or fully automatically.

FIG. 5 illustrates the transcoding hints metadata extraction from compressed and uncompressed source content in accordance with an embodiment of the invention. As shown in FIG. 5, a system 500 includes an A/V source content buffer 501, a source format description buffer 502, and a target format description buffer 503.

A memory 504 is included for storing the motion vector, DCT-coefficient, and feature point extraction from compressed or uncompressed domains. In the compressed domain, motion vectors from P- and B-macroblocks may be directly extracted from a bitstream. However, there are no motion vectors for Intra-macroblocks. Therefore, the motion vectors obtained for B- and P-macroblocks may be interpolated for I-macroblocks (see Roy Wang, Thomas Huang: “Fast Camera motion Analysis in MPEG domain”, IEEE International Conference on Image Processing, ICIP 99, Kobe, Japan, October 1999). DCT coefficients for blocks of Intra-macroblocks may be directly extracted from a bitstream. For P- and B-macroblocks, a limited number of DCT-coefficients (DC and 2 AC coefficients) may be obtained by the method described by Shih-Fu Chang, David G. Messerschmitt: “Manipulation and Composition of MC-DCT compressed video”, IEEE Journal on Selected Areas in Communications, vol. 8, 1996. Exemplary methods of compressed domain feature point extraction and motion estimation are disclosed in the patent by Peter Kuhn: “Method and Apparatus for compressed domain feature point registration and motion estimation”, PCT patent, December 1999, which is incorporated herein by reference. In some cases, the A/V source content may only be available in uncompressed format or in a compression format that is not based on the DCT and motion compensation principle, which is employed by MPEG-1, MPEG-2, MPEG-4, ITU-T H.261, and ITU-T H.263. For the DV format, it may be the case that only the DCT-coefficients are available. In these cases, motion vectors may be obtained by motion estimation methods, cf. e.g. Peter Kuhn: “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”, Kluwer Academic Publishers, 1999. DCT-coefficients may be obtained by performing a block-based DCT-transform, cf. K. R. Rao, P. Yip: “Discrete Cosine Transform: Algorithms, Advantages, Applications”, Academic Press 1990. Feature points in pel domain (uncompressed domain) may be obtained, for example, by the method described by Bruce D. Lucas, Takeo Kanade: “An iterative registration technique with an application to stereo vision”, International Joint Conference on Artificial Intelligence, pp. 674-679, 1981.

A motion analysis part 505 extracts the parameters of a parametric motion model from the motion vector representation in memory 504. Parametric motion models may have 6 and 8 parameters and parametric motion estimation may be obtained by methods described in M. Tekalp: “Digital Video Processing”, Prentice Hall, 1995. The goal of using a motion representation is to eliminate the motion estimation in the transcoder for delay and speed reasons. Therefore, the input representation of motion from the source bitstream may be used to derive the output representation (target bitstream). For example, screen-size resizing, interlaced-progressive conversion, etc., may rely heavily on the motion representation. The parameters of the motion representation may also be used for coding decisions on GOP structure. A texture/edge analysis part 506 may be based on the DCT-coefficients extracted from the bitstream, e.g., K. R. Rao, P. Yip: “Discrete Cosine Transform: Algorithms, Advantages, Applications”, Academic Press 1990, or K. W. Chun, K. W. Lim, H. D. Cho, J. B. Ra: “An adaptive perceptual quantization algorithm for video encoding”, IEEE Transactions on Consumer Electronics, Vol. 39, No. 3, August 1993.

A feature point tracking part 507 for the compressed domain may employ a technique described in Peter Kuhn: “Method and Apparatus for compressed domain feature point registration and motion estimation”, PCT patent, December 1999, which is incorporated herein by reference. A processor 508 calculates the number of new feature points per frame. A processor 509 calculates the temporal video segmentation, and a processor 510 calculates the transcoding hints state for every segment. Methods for these calculations according to an embodiment of the invention will be described in detail below with reference to FIG. 6, FIG. 7, and FIG. 8.

A memory 511 contains the motion-related transcoding hints metadata. A memory 512 contains the texture/edge related transcoding hints metadata, and a memory 513 contains the feature point transcoding hints metadata, all of which will be described in detail below with reference to FIG. 15. A memory 514 contains video segment transcoding hints selection metadata, which will be described with reference to FIG. 16. The automatic extraction, compact representation, and usage of the transcoding hints metadata will now be described.

FIG. 6 discloses a video segmentation and transcoding hints state selection process in accordance with an embodiment of the invention. At step 601, some variables are initialized. The variable “frame” is the current frame number of the source bitstream, and “nframes” is the number of frames within the new video segment (or GOP, group of pictures). The other variables are only of use within this routine. At step 602, the number of frames within the GOP is incremented. At step 603, it is determined whether a new segment/GOP starts within the frame, which will be discussed in detail with reference to FIG. 7. If so (“yes”), control is passed to step 604, otherwise, it is passed to step 615. At step 604, the variable “last_gop_start” is initialized with the value of “new_gop_start”. At steps 608 and 609, the variable “last_gop_stop” is set to “frame-1” if the variable “frame” is larger than 1. Otherwise, at step 610, “last_gop_stop” is set to 1. Next, step 611, which is depicted in detail in FIG. 8, determines the transcoding hints state based on motion parameters 605, texture/edge parameters 606, and feature-point data 607. At step 612, the transcoding hints metadata are output to the transcoding hints metadata buffers. In accordance with an embodiment of the invention, the transcoding hints metadata comprise “nframes” (number of frames within the GOP), the transcoding hints state with all the parameters, and the start frame number of the new GOP (“new_gop_start”). After that, the variable “nframes” is set to 0 and the current frame number “frame” is given to the variable “new_gop_start”. Then, at step 615, it is tested to determine if all frames of the source bitstream have been processed. If not (“no”), control is passed to step 614 where the frame number is incremented and the process is repeated starting from step 602. Otherwise, the process is terminated.
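The control flow of FIG. 6 may be sketched as follows. This is a minimal illustration that follows the step order of the text literally, not a definitive implementation. The function name `segment_video`, the two callbacks `starts_new_gop` (standing in for step 603 and FIG. 7) and `select_state` (standing in for step 611 and FIG. 8), the tuple layout of the output records, and the initial values chosen at step 601 are all assumptions, since the text does not specify them.

```python
def segment_video(num_frames, starts_new_gop, select_state):
    """Sketch of the FIG. 6 loop over frames 1..num_frames.

    starts_new_gop(frame, nframes) -> bool  (hypothetical stand-in for step 603)
    select_state(last_gop_start, last_gop_stop) -> state  (stand-in for step 611)
    Returns a list of (nframes, state, new_gop_start) records (step 612).
    """
    records = []
    # Step 601: initial values are assumed; the text does not give them.
    frame = 1
    nframes = 0
    new_gop_start = 1
    while True:
        nframes += 1                                   # step 602
        if starts_new_gop(frame, nframes):             # step 603
            last_gop_start = new_gop_start             # step 604
            # Steps 608-610: the previous segment ends one frame earlier.
            last_gop_stop = frame - 1 if frame > 1 else 1
            state = select_state(last_gop_start, last_gop_stop)  # step 611
            records.append((nframes, state, new_gop_start))      # step 612
            nframes = 0
            new_gop_start = frame
        if frame >= num_frames:                        # step 615
            break
        frame += 1                                     # step 614
    return records


# Usage with toy callbacks: boundaries at frames 1, 4 and 7; the "state"
# simply echoes the frame range it was asked to classify.
demo = segment_video(
    num_frames=8,
    starts_new_gop=lambda frame, nframes: frame in {1, 4, 7},
    select_state=lambda start, stop: (start, stop),
)
```

Because the text describes no final flush, frames after the last detected boundary produce no record in this sketch.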

FIG. 7 illustrates a method for determining the start frame and the end frame of a new video segment or GOP according to an embodiment of the invention. At step 701, it is determined whether the variable “nframes” from FIG. 6 is an integer multiple of M (which is the I/P frame distance). If so, then “no” is selected and at step 702, it is determined whether the current frame number is the first frame. If so (“no”), control is passed to step 703 where it is determined whether “nframes” is greater than a minimum number of frames “gop_min” within a GOP. In case the result at step 702 is “yes”, a new GOP is started at step 705. In case the result at step 703 is “yes”, a new GOP is started at step 705. In case the result at step 703 is “no”, control is passed to step 704 where it is determined whether “nframes” is greater than a maximum number of frames “gop_max” within a GOP. In case the result at step 704 is “yes”, the GOP is closed at step 706, otherwise, the process is terminated.

FIG. 8 illustrates a process for selecting a transcoding hint state for a specific GOP or A/V segment taking only the number of new feature points per frame into account in accordance with an embodiment of the invention. Based on the basic idea illustrated, similar decision structures may be implemented using the aforementioned motion parameters from a parametric motion estimation as well as texture/edge parameters gained from DCT-coefficients. It is noted that the class of algorithms described may also be used to classify A/V material in terms of motion, edge activity, new content per frame, etc., leading to a higher level of A/V classification. In such cases, the transcoding hint states would represent specific classes of different content material. Referring now to FIG. 8, at step 801, variables “frame_no”, “last_gop_start”, “sum” and “new_seg” are initialized. The variable “frame_no” is given the contents of the “last_gop_start” parameter, and the variables “sum” and “new_seg” are initialized with zero. Then, at step 802, the contents of the variable “sum” are incremented by the number of new feature points of the current frame (“frame_no”). At step 803, it is determined whether the variable “frame_no” is less than the variable “last_gop_stop”. If so (“yes”), step 802 is repeated, otherwise, control is passed to step 804. At step 804, it is determined whether the value of the variable “sum” is less than one-eighth of a predetermined parameter “summax”. The parameter “summax” is a constant that represents the maximum number of feature points that can be tracked from frame to frame multiplied by the number of frames between the frames “last_gop_start” and “last_gop_stop”. It may have the value 200 according to an embodiment of the invention. If the result at step 804 is “yes”, the transcoding hints state 1 is selected at step 806 for which the parameters are shown in Table 1 of FIG. 8. Otherwise, at step 805, it is determined whether the value of the variable “sum” is less than one-quarter of the predetermined parameter “summax”. If so (“yes”), the transcoding hints state 2, as shown in Table 1, is selected at step 807. If not (“no”), the transcoding hints state 3 (as shown in Table 1) is selected at step 808 and the process is terminated. It is noted that the decision thresholds in steps 804 and 805 depend on the definition and number of transcoding hints states.
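The threshold decision of FIG. 8 may be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_transcoding_hints_state` is hypothetical, the per-frame counts are assumed to be supplied as a mapping from frame number to number of new feature points, and the loop is assumed to advance “frame_no” by one on each repetition of step 802, which the text implies but does not state. The parameters attached to each state (Table 1 of FIG. 8) are not reproduced here.

```python
def select_transcoding_hints_state(new_points, last_gop_start, last_gop_stop,
                                   summax=200):
    """Return transcoding hints state 1, 2 or 3 for one GOP or segment.

    new_points maps a frame number to its count of new feature points
    (an assumed input format). summax defaults to 200, the example value
    given in the text.
    """
    # Step 801: initialise the accumulator and the frame counter.
    total = 0
    frame_no = last_gop_start
    # Steps 802-803: accumulate new feature points over the segment.
    while True:
        total += new_points[frame_no]
        if frame_no < last_gop_stop:
            frame_no += 1  # assumed increment before repeating step 802
        else:
            break
    # Step 804: little new content selects state 1.
    if total < summax / 8:
        return 1
    # Step 805: a moderate amount of new content selects state 2.
    if total < summax / 4:
        return 2
    # Step 808: otherwise state 3.
    return 3


# Usage: three segments with increasing amounts of new content.
calm = select_transcoding_hints_state({1: 2, 2: 3, 3: 4}, 1, 3)
medium = select_transcoding_hints_state({1: 10, 2: 10, 3: 10}, 1, 3)
busy = select_transcoding_hints_state({1: 30, 2: 30}, 1, 2)
```

With more than three states, the same structure extends by adding further thresholds, in line with the remark that the thresholds depend on the definition and number of states.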

Transcoding Hints Metadata Description

For metadata explanation, a pseudo C-code style may be used. Abbreviations D for Description and DS for Description Schemes, as defined in the emerging MPEG-7 metadata standard, may be used.

FIG. 9 depicts a structural organization of transcoding hints metadata within a Generic A/V DS 901 in accordance with an embodiment of the invention. As shown in FIG. 9, Segment DS 904 and Media Info DS 902 are derived from Generic A/V DS 901. Segment Decomposition 906 is derived from Segment DS 904, and Video Segment DS 907 and Moving Region DS 908 are derived from Segment Decomposition 906. Segment-based transcoding hints DS 909, which will be described in detail with reference to FIG. 14, is derived from Video Segment DS 907. Video Segment DS 907 accesses one or several transcoding hint state DS 911, which will be described in detail with reference to FIG. 16. From Moving Region DS 908, the Segment-based transcoding hints DS 910 for moving regions, which will be described in detail with reference to FIG. 14, is derived; it accesses one or several transcoding hint state DS 912, which will be described in detail with reference to FIG. 16. From Media Info DS 902, Media Profile DS 903 is derived. From Media Profile DS 903, General Transcoding Hints DS 905, which will be described with reference to FIG. 10, is derived.

FIG. 10 depicts the structural organization of Transcoding Hints DS 1001, which consists of one instance of the Source Format Definition DS 1002, which will be described with reference to FIG. 11, and one or several instances of Target Format Definition DS 1003, which will be described with reference to FIG. 12. Additionally, Transcoding Hints DS 1001 consists of one optional instance of General Transcoding Hints DS 1004, which will be described with reference to FIG. 13, and one optional Transcoding Encoding Complexity DS 1005, which will be described with reference to FIG. 15.

FIG. 11 depicts source format definition transcoding hints metadata (e.g., Source Format Definition DS 1002 in FIG. 10), which is associated with the whole A/V content or with a specific A/V segment, in accordance with an embodiment of the invention. As shown in FIG. 11, relevant Descriptors and Description Schemes may include:

-   bitrate is of type <int> and describes the bit rate per second of the source A/V data stream.
-   size_of_pictures is of type <2*int> and describes the size of picture of the source A/V format in x and y directions.
-   number_of_frames_per_second is of type <int> and describes the number of frames per second of the source content.
-   pel_aspect_ratio is of type <float> and describes the pel aspect ratio.
-   pel_colour_depth is of type <int> and describes the color depth.
-   usage_of_progressive_interlaced_format is of size <1 bit> and describes whether the source format is in progressive or in interlaced format.
-   usage_of_frame_field_pictures is of size <1 bit> and describes whether frame or field pictures are used.
-   compression_method is of type <int> and defines the compression method used for the source format and may be selected from a list that includes: MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261, etc. For every compression method, further parameters may be defined here.
-   GOP_structure is a run-length-encoded data field of the I, P, B-states. For example, in case there are only I-frames in an MPEG-2 video, direct conversion to the DV format in compressed domain is possible.
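As an illustration of the GOP_structure field listed above, the following minimal sketch run-length encodes a sequence of I, P, B picture types and checks for the I-frame-only case mentioned in the text. The in-memory layout (a list of type/run-length pairs) and the function names `rle_encode`, `rle_decode`, and `is_intra_only` are assumptions made for this example; the text does not define the actual bit-level layout of the field.

```python
from itertools import groupby


def rle_encode(picture_types):
    """Run-length encode a string of picture types such as "IBBPBBP"."""
    return [(ptype, len(list(run))) for ptype, run in groupby(picture_types)]


def rle_decode(runs):
    """Expand (type, run length) pairs back into a picture-type string."""
    return "".join(ptype * count for ptype, count in runs)


def is_intra_only(runs):
    """True if the GOP structure contains only I-frames."""
    return all(ptype == "I" for ptype, _ in runs)


# Usage: a typical GOP in display order and an I-frame-only sequence.
gop = rle_encode("IBBPBBP")
intra = rle_encode("IIII")
```

A transcoder could test `is_intra_only` on the source GOP_structure before choosing the direct compressed-domain conversion path that the text mentions for I-frame-only MPEG-2 video.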

FIG. 12 depicts target format definition transcoding hints metadata, which may be associated to the whole A/V content or to a specific A/V segment, in accordance with an embodiment of the invention. As shown in FIG. 12, the relevant Descriptors and Description Schemes may include:

-   bitrate is of type <int> and describes the bit rate per second of the target A/V data stream.
-   size_of_pictures is of type <2*int> and describes the picture size of the target A/V format in the x and y directions.
-   number_of_frames_per_second is of type <int> and describes the number of frames per second of the target content.
-   pel_aspect_ratio is of type <float> and describes the pel aspect ratio.
-   pel_colour_depth is of type <int> and describes the color depth.
-   usage_of_progressive_interlaced_format is of size <1 bit> and describes whether the target format needs to be progressive or interlaced.
-   usage_of_frame_field_pictures is of size <1 bit> and describes whether frame or field pictures are used.
-   compression_method is of type <int> and defines the compression method used for the target format and may be selected from a list that includes: MPEG-1, MPEG-2, MPEG-4, DV, H.263, H.261, etc. For every compression method, further parameters may be defined here.
-   GOP_structure is an optional run-length-encoded data field of the I, P, B-states. With this optional parameter, a fixed GOP structure may be forced. A fixed GOP structure may be useful, for example, to force I-frames at certain locations to facilitate video editing.
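The source and target format definitions share the same descriptors, so a transcoder could compare them field by field to see which conversions are needed. The following Python sketch illustrates that idea under stated assumptions: the class name FormatDefinition, the Boolean field names progressive and frame_pictures (standing in for the two 1-bit descriptors), the string-valued compression_method, and the helper differing_parameters are all hypothetical and chosen only for readability.

```python
from dataclasses import dataclass, fields, replace
from typing import Tuple


@dataclass(frozen=True)
class FormatDefinition:
    """Hypothetical container for the format descriptors listed above."""
    bitrate: int                       # bits per second
    size_of_pictures: Tuple[int, int]  # (x, y)
    number_of_frames_per_second: int
    pel_aspect_ratio: float
    pel_colour_depth: int
    progressive: bool                  # 1-bit progressive/interlaced flag
    frame_pictures: bool               # 1-bit frame/field pictures flag
    compression_method: str            # a string here; <int> in the text


def differing_parameters(source, target):
    """Names of descriptors whose values differ between source and target."""
    return [f.name for f in fields(source)
            if getattr(source, f.name) != getattr(target, f.name)]


# Example values are invented for illustration only.
source = FormatDefinition(6_000_000, (720, 576), 25, 1.0, 8,
                          False, True, "MPEG-2")
target = replace(source, bitrate=1_500_000, size_of_pictures=(352, 288))
needed = differing_parameters(source, target)
```

In this example only the bit rate and the picture size differ, so only rate conversion and size conversion would be required.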

FIG. 13 depicts general transcoding hints metadata (e.g., General Transcoding Hints DS 1004 in FIG. 10), which may be associated to the whole A/V content or to a specific A/V segment, according to an embodiment of the invention. As shown in FIG. 13, relevant Descriptors and Description Schemes may include:

-   use_region_of_interest_DS has a length of <1 bit> and indicates whether a region of interest description scheme is available as transcoding hints.
-   In case the region_of_interest_DS is used, a shape_D (which may be, for example, one of the following: boundary_box_D, MB_shape_D, or any other shape_D) together with a motion_trajectory_D may be used to spatially and temporally describe the region of interest. An MB_shape_D may use macroblock-sized (16×16) blocks for object shape description. Motion_trajectory_D already includes a notion of time so that the start frame and the end frame of the region_of_interest_DS may be defined. The region_of_interest_DS may have the size of the respective shape_D and the respective motion_trajectory_D. For transcoding applications, the region_of_interest_DS may be used, for example, to spend more bits (or modify the quantizer, respectively) for the blocks within the region of interest than for the background. Another transcoding application to MPEG-4 may be to describe the region of interest by a separate MPEG-4 object and to spend a higher bit rate and a higher frame rate for the region of interest than for other MPEG-4 objects like the background. The extraction of the region_of_interest_DS may be performed automatically or manually.
-   use_editing_effects_transcoding_hints_DS has a length of <1 bit> and indicates if information is available on editing-effects-based transcoding hints.
-   camera_flash is a list of entries where every entry describes the frame number where a camera flash occurs. Therefore, the length of the descriptor is the number of camera flash events multiplied by <int>.
    For transcoding applications, the camera_flash descriptor is very useful, as most video (re-)encoders/transcoders use a motion estimation method based on the luminance difference (cf. Peter Kuhn: “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”, Kluwer Academic Publishers, 1999). In case of a luminance-based motion estimation, the mean absolute error between two macroblocks of two subsequent frames (one with flash, one without flash) would be too high for prediction, and the frame with the camera flash would have to be encoded as an Intra-frame with high bit rate costs. Therefore, indicating the camera flash within a transcoding hints Description Scheme (“DS”) allows for using, for example, a luminance-corrected motion estimation method or other means to predict the frame with the camera flash from the anchor frame(s) with moderate bit costs.
-   cross_fading is a list of entries where every entry describes the start frame and the end frame of a cross fading. Therefore, the length of this descriptor is two times <int> multiplied by the number of cross fading events. Indicating the cross fading events in transcoding hints metadata is very useful for controlling the bit rate/quantizer during the cross fading. During cross fading, prediction is generally of limited use, causing a bit rate increase for prediction error coding. Because the scene is usually blurred during cross fading, the bit rate increase may be limited by adjusting the quantizer scale, bit rate, or rate control parameters, respectively.
-   black_pictures is a list of entries where every entry describes the start frame and the end frame of a sequence of black pictures. Between scenes, especially in home video, black pictures may occur.
    Experimental results indicate that a series of black pictures increases the bit rate in motion-compensated DCT coders because the prediction is only of limited use. Therefore, this transcoding hints descriptor may be used to limit the bit rate during black pictures by adjusting the quantizer scale, bit rate, or rate control parameters, respectively.
-   fade_in is similar to cross_fading, and is described as a number of entries determining the start frame and the end frame of a fade in. In comparison to cross fading, the fade in starts from black pictures, and, therefore, a kind of masking effect of the eye may be used to limit the bit rate during fade in by adjusting the quantizer scale, bit rate, or rate control parameters, respectively.
-   fade_out is similar to fade_in, except that it describes the transition from a scene to a series of black pictures.
-   abrupt_change is described by a list of single frame numbers of type <int> indicating where abrupt scene or shot changes without fading appear. These events are indicated, for example, by the very high and sharp peaks in FIG. 3. These peaks indicate the beginning of a new camera shot or scene. The abrupt_change editing effect is in contrast to the fading effects. When abrupt changes between two video segments appear, the human visual perception needs a few milliseconds to adapt and recognize the details of the new A/V segment. This slowness effect of the human eye may be used beneficially for video transcoding, for example, for reducing the bit rate or modifying the quantizer scale parameters for the first frames of a video segment after an abrupt change of a scene or shot.
-   use_motion_transcoding_hints_DS has a length of <1 bit> and indicates the use of motion-related transcoding hints metadata.
-   number_of_regions indicates the number of regions for which the following motion-related transcoding hints metadata are valid.
-   for_every_region is a field of <1 bit> length that indicates, for every region, whether the region is rectangular or arbitrarily shaped. In case the region is arbitrarily shaped, a region descriptor (consisting, e.g., of a shape descriptor and a motion trajectory descriptor) is used. In case of a rectangular region, the size of the rectangular region is used. The motion field within this region is described by a parametric motion model, which is determined by several parameters for every frame or sequence of frames. For transcoding, this motion representation of the real motion of the source video may be used to limit the search area of the computationally complex motion estimation of the (re-)encoding part, and also for fast and efficient interlaced/de-interlaced (frame/field) conversion and for determining the GOP (Group of Pictures) structure depending on the amount of motion within the video. The motion representation may also be used beneficially for size conversion of the video.
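The use of the editing-effect descriptors for quantizer control can be sketched as follows. This Python fragment is only an illustration under stated assumptions: the function name quantizer_schedule, the doubling of the quantizer scale, the upper limit of 31, and the settle_frames parameter (how many frames after an abrupt change are coded more coarsely) are all hypothetical choices; the description states only that the quantizer scale, bit rate, or rate control parameters may be adjusted during such events.

```python
def quantizer_schedule(nframes, base_q, cross_fading=(), black_pictures=(),
                       abrupt_change=(), settle_frames=3, max_q=31):
    """Per-frame quantizer scale derived from editing-effect hints.

    cross_fading and black_pictures are lists of (start_frame, end_frame)
    pairs, inclusive; abrupt_change is a list of single frame numbers.
    Doubling the scale and capping it at max_q are assumptions made for
    this sketch, not values taken from the description.
    """
    coarse_q = min(max_q, base_q * 2)
    q = [base_q] * nframes
    # Coarser quantization during cross fades and black picture runs.
    for start, end in list(cross_fading) + list(black_pictures):
        for f in range(max(start, 0), min(end, nframes - 1) + 1):
            q[f] = coarse_q
    # Coarser quantization for the first frames after an abrupt change.
    for start in abrupt_change:
        for f in range(max(start, 0), min(start + settle_frames, nframes)):
            q[f] = coarse_q
    return q


schedule = quantizer_schedule(10, 8, cross_fading=[(2, 3)],
                              abrupt_change=[7], settle_frames=2)
```

A real rate controller would likely blend such hints with buffer state rather than switch between two fixed values; the two-level schedule is used here only to keep the example short.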

FIG. 14 depicts the segment-based transcoding hints metadata (e.g., segment-based transcoding hints DS 909 and 910 in FIG. 9) which may be used to determine the (re-)encoder/transcoder settings for an A/V segment which depicts constant characteristics, in accordance with an embodiment of the invention. As shown in FIG. 14, relevant Descriptors and Description Schemes may include:

-   start_frame is of type <int> and describes the frame number of the beginning of the transcoding hints metadata of an A/V segment.
-   nframes is of type <int> and describes the length of an A/V segment.
-   I_frame_location gives several possibilities for describing the location of I-frames within an A/V segment.
-   select_one_out_of_the_following is of size <2 bits> and selects one of the following four I-frame location description methods.
-   first_frame is of size <1 bit> and is the default I-frame location. This method describes an A/V segment where only the first frame of the A/V segment is an Intra-frame, which is used as an anchor for further prediction, and all other frames within the A/V segment are P- or B-frames.
-   list_of_frames gives a list of frame numbers of Intra-frames within an A/V segment. This method allows for arbitrarily describing the location of Intra-frames within an A/V segment. For k frames within this list, the size of this descriptor is <k*int>.
-   first_frame_and_every_k_frames is of type <int>, where the first frame within a segment is Intra and k describes the interval of I-frames within the A/V segment.
-   no_I_frame is of size <1 bit> and describes the case where no I-frame is used within an A/V segment, which is useful when the encoding of the A/V segment is based on an anchor (Intra-frame) in a previous segment.
-   quantizer_scale is of type <int> and describes the initial quantizer scale value for an A/V segment.
-   target_bitrate is of type <int> and describes the target bit rate per second for an A/V segment.
-   target_min_bitrate is of size <int> and describes the minimum target bit rate per second for an A/V segment (optional).
-   target_max_bitrate is of size <int> and describes the maximum target bit rate per second for an A/V segment (optional).
-   use_transcoding_states is of size <1 bit> and describes whether transcoding hint states are used for an A/V segment.
-   transcoding_state_nr is of type <int> and gives the transcoding hint metadata state for a segment. The transcoding hint metadata state is a pointer to an entry in a table of transcoding hint states. The table of transcoding hint states may have several entries, where new entries may be added or deleted by transcoding hints parameters. The transcoding hints metadata of a single transcoding hint state will be described with reference to FIG. 16.
-   add_new_transcoding_state is of size <1 bit> and describes whether a new transcoding state with associated information has to be added to the transcoding hints table. In case the add_new_transcoding_state signals “yes”, a list of parameters of the new transcoding hints state is given. The size of the parameter list is determined by the number of parameters of one transcoding hints state and the number of transcoding hints states.
-   remove_transcoding_state is a flag of size <1 bit> indicating whether a transcoding state may be removed or not. In case a transcoding state may be removed, the state number (type: <int>) of the transcoding state to be removed is given.
-   use_encoding_complexity_description is of size <1 bit> and signals whether a more detailed encoding complexity description scheme as defined in FIG. 15 has to be used.
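The four I-frame location description methods can be resolved into a concrete list of Intra-frame numbers, as in the following Python sketch. The function name i_frame_locations and its argument layout are hypothetical. The sketch also assumes that the entries of list_of_frames are absolute frame numbers and that the interval k is counted from the first frame of the segment; the description does not state either point explicitly.

```python
def i_frame_locations(start_frame, nframes, method, frames=None, k=None):
    """Resolve an I-frame location method into a list of frame numbers.

    method is one of the four descriptor names from the list above.
    Absolute frame numbering and counting k from start_frame are
    assumptions made for this sketch.
    """
    if method == "first_frame":
        return [start_frame]
    if method == "list_of_frames":
        return sorted(frames or [])
    if method == "first_frame_and_every_k_frames":
        if k is None or k < 1:
            raise ValueError("k must be a positive integer")
        return list(range(start_frame, start_frame + nframes, k))
    if method == "no_I_frame":
        return []
    raise ValueError(f"unknown I-frame location method: {method!r}")


every_12 = i_frame_locations(100, 30, "first_frame_and_every_k_frames", k=12)
```

With a segment that starts at frame 100 and spans 30 frames, an interval of 12 places Intra-frames at frames 100, 112, and 124.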

FIG. 15 depicts the coding complexity transcoding hints metadata, which may be associated to the whole A/V content or to a specific A/V segment, according to an embodiment of the invention. Encoding complexity metadata may be used for rate control and determines the quantizer and bit rate settings.

-   use_feature_points is of size <1 bit> and indicates the use of feature-point-based complexity estimation data.
-   select_feature_point_method is of size <2 bits> and selects the feature point method.
-   number_of_new_feature_points_per_frame describes a list of the number of new feature points per frame, as indicated in FIG. 3, and is of size <nframes*int>. This metric indicates the amount of new content per frame.
-   feature_point_metrics describes a list of metrics based on the new feature points per frame within one segment. The metrics are represented as an ordered list of <int> values with the following meaning: mean, max, min, variance, standard deviation of the number of the new feature points per frame.
-   use_equation_description is an <int> pointer to an equation-based description of the encoding complexity per frame.
-   use_motion_description is of size <1 bit> and indicates the use of a motion-based complexity description.
-   select_motion_method is of size <4 bits> and selects the motion description method.
-   param_k_motion is of size <nframes*k*int> and describes the k parameters for every single frame of a global parametric motion model.
-   motion_metrics describes a list of metrics for the whole segment based on the size of the motion vectors. The metrics are represented as an ordered list of <int> values with the following meaning: mean, max, min, variance, standard deviation of the macroblock motion vectors.
-   block_motion_field describes every vector of an m*m block sized motion field and is of size <nframes*int*size_x*size_y/(m*m)>.
-   use_texture_edge_metrics is a flag that is set when texture or edge metrics are used and it is of size <1 bit>.
-   select_texture_edge_metrics is of size <4 bits> and determines which of the following texture metrics is used.
-   DCT_block_energy is the sum of all DCT-coefficients of one block and is defined for every block within a frame. It is of size <size_y*size_x*nframes*int/64>.
-   DCT_block_activity is defined as the sum of all DCT-coefficients of one block but without the DC coefficient. It is defined for every block within a frame and is of size <size_y*size_x*nframes*int/64>.
-   DCT_energy_metric describes a list of metrics for the whole segment based on the individual DCT energies of each block. The metrics are represented as an ordered list of <int> values with the following meaning: mean, max, min, variance, standard deviation of all the individual DCT energy metrics. The size of the descriptor is <6*int>. An alternative implementation of this descriptor is to describe the DCT energy metric for every single frame of the video segment.
-   DCT_activity_metric describes a list of metrics for the whole segment based on the individual DCT activities of each block. The metrics are represented as an ordered list of <int> values with the following meaning: mean, max, min, variance, standard deviation of all the individual DCT activity metrics. The size of the descriptor is <6*int>. An alternative implementation of this descriptor is to describe the DCT activity metric for every single frame of the video segment.
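The ordered metric lists and the two DCT block measures described above might be computed as in the following Python sketch. Several points are assumptions made for illustration: the function names, the use of the population variance, the rounding of each metric to an integer, and the summation of absolute coefficient values (the description says only "the sum of all DCT-coefficients").

```python
import math
import statistics


def ordered_metrics(values):
    """Return [mean, max, min, variance, standard deviation] as integers.

    Population variance and rounding to <int> are assumptions of this
    sketch; the description gives only the order of the metrics.
    """
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)
    return [round(mean), max(values), min(values),
            round(variance), round(math.sqrt(variance))]


def dct_block_energy(block):
    """Sum of the absolute values of all coefficients of one 8x8 block."""
    return sum(abs(c) for row in block for c in row)


def dct_block_activity(block):
    """Same sum as dct_block_energy, but without the DC coefficient."""
    return dct_block_energy(block) - abs(block[0][0])


# Invented example data: new feature points per frame for eight frames,
# and one 8x8 coefficient block with a DC term and two AC terms.
feature_point_metrics = ordered_metrics([2, 4, 4, 4, 5, 5, 7, 9])
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 100, -3, 5
```

The same ordered_metrics helper could serve feature_point_metrics, motion_metrics, DCT_energy_metric, and DCT_activity_metric, since all four share the same ordering.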

FIG. 16 depicts the transcoding hints state metadata, which may be associated to the whole audio-visual content or to a specific A/V segment according to an embodiment of the invention. Relevant Descriptors and Description Schemes may include:

-   M is of type <int> and describes the I-frame/P-frame distance.
-   bitrate_fraction_for_I is of type <float> and describes the fraction of the bit rate defined for an A/V segment that is available for I-frames.
-   bitrate_fraction_for_P is of type <float> and describes the fraction of the bit rate defined for an A/V segment that may be used for P-frames. The bit rate fraction for B-frames is the remainder up to 100%.
-   quantizer_scale_ratio_I_P is of type <float> and denotes the relation of the quantizer scale (as defined for this segment) between I- and P-frames.
-   quantizer_scale_ratio_I_B is of type <float> and denotes the relation of the quantizer scale (as defined for this segment) between I- and B-frames. It is noted that either the bit rate descriptors (bitrate_fraction_for_I, bitrate_fraction_for_P), the quantizer_scale_ratio descriptors (quantizer_scale_ratio_I_P, quantizer_scale_ratio_I_B), or the following rate-control parameters may be mandatory.
-   X_I, X_P, X_B are frame_vbv_complexities and are each of type <int> and are defined in case of a frame-based compression target format (cf. FIG. 12). These and the following Virtual Buffer Verifier (“VBV”) complexity adjustments may be optional and may be used to modify the rate control scheme according to the source content characteristics and the target format definition.
-   X_I_top, X_P_top, X_B_top are field_vbv_complexities for the top field and are each of type <int> and are defined in case of a field-based compression target format (cf. FIG. 12).
-   X_I_bot, X_P_bot, X_B_bot are field_vbv_complexities for the bottom field and are each of type <int> and are defined in case of a field-based compression target format (cf. FIG. 12).
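Because the bit rate fraction for B-frames is defined implicitly as the remainder, a transcoder has to derive it from the two stored fractions. The following Python sketch shows one way to do so and to split a segment's target bit rate accordingly. The class name TranscodingHintsState, the helper bit_budget, and the example values are hypothetical; only the field names M, bitrate_fraction_for_I, and bitrate_fraction_for_P come from the description.

```python
from dataclasses import dataclass


@dataclass
class TranscodingHintsState:
    """Hypothetical subset of one transcoding hints state."""
    M: int                          # I-frame/P-frame distance
    bitrate_fraction_for_I: float   # fraction in the range 0..1
    bitrate_fraction_for_P: float   # fraction in the range 0..1

    @property
    def bitrate_fraction_for_B(self):
        # B-frames receive whatever I- and P-frames leave over.
        return 1.0 - self.bitrate_fraction_for_I - self.bitrate_fraction_for_P


def bit_budget(state, target_bitrate):
    """Split target_bitrate (bits per second) among I-, P- and B-frames."""
    used = state.bitrate_fraction_for_I + state.bitrate_fraction_for_P
    if not 0.0 <= used <= 1.0:
        raise ValueError("I and P fractions must sum to at most 1")
    bits_i = round(target_bitrate * state.bitrate_fraction_for_I)
    bits_p = round(target_bitrate * state.bitrate_fraction_for_P)
    # Give B-frames the exact remainder so the three parts sum to the total.
    return {"I": bits_i, "P": bits_p, "B": target_bitrate - bits_i - bits_p}


state = TranscodingHintsState(M=3, bitrate_fraction_for_I=0.4,
                              bitrate_fraction_for_P=0.35)
budget = bit_budget(state, 4_000_000)
```

Computing the B-frame share as an integer remainder, rather than from the floating-point fraction, keeps the three budgets consistent with the segment's target_bitrate.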

It will thus be seen that the objects set forth above, among those made apparent from the preceding description, are efficiently attained and, because certain changes may be made in carrying out the above method and in the construction(s) set forth without departing from the spirit and scope of the invention, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.

It is also to be understood that the following claims are intended to cover all of the generic and specific features of the invention herein described and all statements of the scope of the invention which, as a matter of language, might be said to fall therein.

1. A video signal processing method for processing video signals containing video material by a server, the server executing instructions to perform the method comprising: separating video material from the video signals into segments, the video material comprising a plurality of frames; calculating a number of new feature points per frame of the video material; determining, for each segment, whether the number of new feature points exceeds a threshold value; selecting, based on the determination, one of a plurality of transcoding hints states; and transcoding the video material according to the selected one of a plurality of transcoding hints states.
2. A video signal processing method according to claim 1, wherein the step of separating the video material into segments comprises the steps of: using feature points with associated motion vectors; tracking the feature points; and determining a new video segment for transcoding based on a number of the feature points that could not be tracked from one frame to a next frame.
3. A video signal processing method according to claim 1, wherein the step of transcoding the video material comprises the steps of: receiving a first bitstream of compressed image data having a first GOP structure; extracting transcoding hints metadata from the first bitstream; utilizing the transcoding hints metadata associated to the first bitstream to facilitate transcoding; and outputting a second bitstream.
4. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of utilizing the transcoding hints metadata associated to temporal segments of the first bitstream to facilitate transcoding.
5. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of utilizing the transcoding hints metadata associated to spatial segments of the first bitstream to facilitate transcoding.
6. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of controlling a bit rate of the second bitstream so that a bit rate of the first bitstream is different from the bit rate of the second bitstream.
7. A video signal processing method according to claim 6, wherein the step of transcoding the video material further comprises the step of adjusting a size of pictures represented by the first bitstream so that pictures represented by the second bitstream exhibit a size different from the size of the pictures represented by the first bitstream.
8. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of adjusting a size of pictures represented by the first bitstream so that pictures represented by the second bitstream exhibit a size different from the size of the pictures represented by the first bitstream.
9. A video signal processing method according to claim 8, wherein the step of transcoding the video material further comprises the step of encoding the pictures represented by the second bitstream as field pictures when the pictures represented by the first bitstream are encoded as frame pictures.
10. A video signal processing method according to claim 8, wherein the step of transcoding the video material further comprises the step of interlacing the pictures represented by the first bitstream when the pictures represented by the first bitstream are received as a progressive sequence so that the pictures represented by the second bitstream are output as an interlaced sequence.
11. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of encoding pictures represented by the second bitstream as field pictures when pictures represented by the first bitstream are encoded as frame pictures.
12. A video signal processing method according to claim 3, wherein the step of transcoding the video material further comprises the step of interlacing pictures represented by the first bitstream when pictures represented by the first bitstream are received as a progressive sequence so that pictures represented by the second bitstream are output as an interlaced sequence.
13. A video signal processing method according to claim 1, further comprising the steps of: describing transcoding target bitstream parameters; extracting transcoding hints metadata; and storing the transcoding hints metadata.
14. A video signal processing method according to claim 13, wherein the step of describing the transcoding target bitstream parameters comprises the steps of: defining a bit rate of a second bitstream of compressed images; defining a size of pictures of the second bitstream of compressed images; defining a number of frames per second of the second bitstream of compressed images; defining an aspect ratio of a pel of the second bitstream of compressed images; defining a color depth of each of the pel of the second bitstream of compressed images; defining whether progressive format is used for the second bitstream of compressed images; defining whether interlaced format is used for the second bitstream of compressed images; defining whether frame pictures are used for the second bitstream of compressed images; defining whether field pictures are used for the second bitstream of compressed images; and defining a compression method of the second bitstream of compressed images.
15. A video signal processing method according to claim 14, wherein the step of describing the transcoding target bitstream parameters further comprises the step of defining employed compression standards as defined by MPEG (Moving Pictures Expert Group).
16. A video signal processing method according to claim 14, wherein the step of describing the transcoding target bitstream parameters further comprises the step of defining employed compression standards as defined by ITU-T (International Telecommunications Union Technical Standards Group).
17. A video signal processing method according to claim 13, wherein the step of extracting the transcoding hints metadata comprises the steps of: receiving a first bitstream of compressed image data having a first GOP structure; obtaining first motion information from the first bitstream; obtaining texture/edge information of a first segmentation; obtaining feature points and associated motion information from the first bitstream; and obtaining region of interest information from the first bitstream.
18. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of storing the first motion information as transcoding hints metadata.
19. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing motion-related transcoding hints metadata as parameters of a parametric motion model.
20. A video signal processing method according to claim 19, wherein the step of extracting the transcoding hints metadata further comprises the step of employing the parametric motion model to describe a global motion within subsequent rectangular video frames.
21. A video signal processing method according to claim 19, wherein the step of extracting the transcoding hints metadata further comprises the step of employing the parametric motion model to describe a motion within a defined region of arbitrary shape.
22. A video signal processing method according to claim 21, wherein the parametric motion model is employed to describe the motion within the defined region of arbitrary shape as used within MPEG-4.
23. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing motion-related transcoding hints metadata as an array of motion vectors contained in the first bitstream of the compressed image data.
24. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing motion-related transcoding hints metadata as an array of motion vectors derived from motion vectors contained in the first bitstream of the compressed image data.
25. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing motion-related transcoding hints metadata as a list of feature points with associated motion vectors, which are tracked within subsequent frames.
26. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing motion-related transcoding hints metadata as a list of feature points with associated motion vectors, which are tracked within arbitrarily shaped regions, within subsequent frames.
27. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing texture-related transcoding hints metadata as one of a list of DCT-coefficients and a measure (one of mean, minimum, maximum, variance, and standard deviation) derived thereof.
28. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing edge-related transcoding hints metadata as one of a list of DCT-coefficients and a measure derived thereof, the measure being one of mean, minimum, maximum, variance, or standard deviation.
29. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing the feature points and associated motion-related transcoding hints metadata as a list.
30. A video signal processing method according to claim 17, wherein the step of extracting the transcoding hints metadata further comprises the step of representing encoding-complexity-related transcoding hints metadata as a complexity metric derived from the feature points tracked within subsequent frames by using a number of lost and new feature points from one frame to a next frame.
31. A video signal processing method according to claim 13, wherein the step of storing the transcoding hints metadata comprises the step of maintaining a buffer containing transcoding hints metadata for several situations.
32. A video signal processing method according to claim 31, wherein the step of storing the transcoding hints metadata further comprises the step of storing individual general transcoding hints metadata for several target devices.
33. A video signal processing method according to claim 31, wherein the step of storing the transcoding hints metadata further comprises the step of storing general transcoding hints metadata for video segments of varying scene activity.
34. An apparatus for processing supplied video signals, comprising: a segmenting unit for separating video material into segments, the video material comprising a plurality of frames; a calculation unit for calculating a number of new feature points per frame of the video material; a determination unit for determining, for each segment, whether the number of new feature points exceeds a threshold value; a selection unit for selecting, based on the determination, one of a plurality of transcoding hints states; and a transcoding unit for transcoding the video material according to the selected one of a plurality of transcoding hints states.
35. The apparatus of claim 34, further comprising: a target buffer for storing at least one description of transcoding target bitstream parameters; an extraction unit for extracting transcoding hints metadata based on the at least one description; and a buffer for storing the transcoding hints metadata.